Developing an Unsupervised Grammar Checker for Filipino Using Hybrid N-grams as Grammar Rules
Abstract
This study uses hybrid n-grams as grammar rules for detecting grammatical errors and suggesting corrections in Filipino. The rules are derived from grammatically correct, tagged texts, and each rule is a sequence of part-of-speech (POS) tags, lemmas, and surface words. Because of this rule structure, the system offers a path to an unsupervised grammar checker for Filipino when coupled with existing POS taggers and morphological analyzers. The approach is also customized to cover different error types present in Filipino. The system achieved 82% accuracy when tested on both erroneous and error-free texts.
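To make the idea of a hybrid n-gram concrete, here is a minimal Python sketch, not the authors' implementation. It assumes each token carries a surface word, a lemma, and a POS tag, and that each rule slot constrains exactly one of those three features. The names `Token`, `rule_matches`, and `find_unmatched`, the tag labels (`DET`, `PART`, `NOUN`), and the three toy rules are all hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Token(NamedTuple):
    word: str   # surface form
    lemma: str  # dictionary form (would come from a morphological analyzer)
    pos: str    # part-of-speech tag (would come from a POS tagger)


# A slot constrains one token by one feature: ("word" | "lemma" | "pos", value).
Slot = Tuple[str, str]
# A hybrid n-gram rule is a fixed-length sequence of slots.
Rule = Tuple[Slot, ...]


def rule_matches(rule: Rule, window: Sequence[Token]) -> bool:
    """True if every slot of the rule agrees with the token at the same position."""
    if len(rule) != len(window):
        return False
    return all(getattr(tok, feature) == value
               for (feature, value), tok in zip(rule, window))


def find_unmatched(tokens: List[Token], rules: List[Rule], n: int) -> List[int]:
    """Return start indices of n-token windows that match no rule (possible errors)."""
    return [i for i in range(len(tokens) - n + 1)
            if not any(rule_matches(r, tokens[i:i + n]) for r in rules)]


# Hypothetical bigram rules mixing surface words and POS tags.
RULES: List[Rule] = [
    (("word", "ang"), ("word", "mga")),
    (("word", "mga"), ("pos", "NOUN")),
    (("pos", "DET"), ("pos", "NOUN")),
]

ANG = Token("ang", "ang", "DET")
MGA = Token("mga", "mga", "PART")
BATA = Token("bata", "bata", "NOUN")

well_formed = find_unmatched([ANG, MGA, BATA], RULES, n=2)
scrambled = find_unmatched([MGA, ANG, BATA], RULES, n=2)
```

In this toy setup, every bigram of the well-formed sequence matches some rule, while the scrambled sequence leaves its first bigram unmatched, which a checker could flag as a candidate error. Suggesting corrections would need an additional step, such as searching for rules that differ from the flagged window in a single slot; that step is not shown here.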
Similar resources
Google Books N-gram Corpus used as a Grammar Checker
In this research we explore the possibility of using a large n-gram corpus (Google Books) to derive lexical transition probabilities from the frequency of word n-grams and then use them to check and suggest corrections in a target text without the need for grammar rules. We conduct several experiments in Spanish, although our conclusions also reach other languages since the procedure is corpus-...
A Greedy Approach to Unsupervised Grammar Induction for Filipino
This paper discusses the Greedy Merge Model used for an unsupervised grammar induction system for the Filipino language. The approach attempts to address the current state of Philippine linguistic resources, specifically the formal grammars, which are insubstantial for robust analysis. The Greedy Merge Model results show an F1 measure of 69%. Generated grammar rules are ...
Creating Algorithmic Symbols to Enhance Learning English Grammar
This paper introduces a set of English grammar symbols that the author has developed to enhance students’ understanding and consequently, application of the English grammar rules. A pretest-posttest control-group design was carried out in which the samples were students in two girls’ senior high schools (N=135, P ≤ 0.05) divided into two groups: the Treatment which received gramm...
Improving CoGrOO: the Brazilian Portuguese Grammar Checker
This paper highlights the main results obtained in an effort to improve the grammar checker CoGrOO, a hybrid system which initially annotates the text using statistical Natural Language Processing (NLP) techniques, and then applies a rule-based analysis to identify possible grammar errors. The goal was to reduce omissions and false alarms while improving true positives without adding new error ru...
Part of Speech Induction from Distributional Features: Balancing Vocabulary and Context
Past research on grammar induction has found promising results in predicting parts-of-speech from n-grams using a fixed vocabulary and a fixed context. In this study, we investigated grammar induction whereby we varied vocabulary size and context size. Results indicated that as context increased for a fixed vocabulary, overall accuracy initially increased but then leveled off. Importantly, this...